On learning hierarchical classifications
Authors
Abstract
Many significant real-world classification tasks involve a large number of categories which are arranged in a hierarchical structure; for example, classifying documents into subject categories under the Library of Congress scheme, or classifying World Wide Web documents into topic hierarchies. We investigate the potential benefits of using a given hierarchy over base classes to learn accurate multi-category classifiers for these domains. First, we consider the possibility of exploiting a class hierarchy as prior knowledge that can help one learn a more accurate classifier. We explore the benefits of learning category discriminants in a “hard” top-down fashion and compare this to a “soft” approach which shares training data among sibling categories. In doing so, we verify that hierarchies have the potential to improve prediction accuracy. But we argue that the reasons for this can be subtle. Sometimes, the improvement is only because using a hierarchy happens to constrain the expressiveness of a hypothesis class in an appropriate manner. However, various controlled experiments show that in other cases the performance advantage associated with using a hierarchy really does seem to be due to the “prior knowledge” it encodes.
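To make the “hard” top-down idea concrete, here is a minimal sketch in Python. It is an illustration only and not the authors' method: the two-level hierarchy, the toy training set, the word-overlap scorer, and every name in it (`HIERARCHY`, `fit_counts`, `predict`, `train_top_down`, `classify_top_down`) are assumptions invented for the example. One discriminant is trained at the root to choose a top-level category, and one per top-level category to choose among its leaves; a prediction commits to the root decision before the leaf decision is made.

```python
from collections import Counter

# Hypothetical two-level hierarchy: top-level category -> leaf classes.
HIERARCHY = {
    "science": ["physics", "biology"],
    "arts": ["music", "painting"],
}

# Tiny invented training set of (tokens, leaf label) pairs.
TRAIN = [
    (["quantum", "energy", "particle"], "physics"),
    (["force", "energy", "mass"], "physics"),
    (["cell", "gene", "protein"], "biology"),
    (["species", "gene", "evolution"], "biology"),
    (["melody", "rhythm", "chord"], "music"),
    (["song", "chord", "tempo"], "music"),
    (["canvas", "brush", "color"], "painting"),
    (["portrait", "color", "oil"], "painting"),
]

# Map each leaf class to its top-level category.
PARENT = {leaf: top for top, leaves in HIERARCHY.items() for leaf in leaves}


def fit_counts(examples):
    """Return per-label word counts for a list of (tokens, label) pairs."""
    counts = {}
    for tokens, label in examples:
        counts.setdefault(label, Counter()).update(tokens)
    return counts


def predict(counts, tokens):
    """Pick the label whose training words overlap most with the tokens.

    This overlap score is a stand-in for whatever discriminant is used
    in practice; ties go to the alphabetically first label.
    """
    def score(label):
        return sum(counts[label][t] for t in tokens)
    return max(sorted(counts), key=score)


def train_top_down(train):
    """Train one discriminant at the root and one per top-level category."""
    root = fit_counts([(toks, PARENT[leaf]) for toks, leaf in train])
    children = {
        top: fit_counts(
            [(toks, leaf) for toks, leaf in train if PARENT[leaf] == top]
        )
        for top in HIERARCHY
    }
    return root, children


def classify_top_down(model, tokens):
    """Hard routing: commit to a top-level category, then pick a leaf in it."""
    root, children = model
    top = predict(root, tokens)
    return top, predict(children[top], tokens)


model = train_top_down(TRAIN)
result = classify_top_down(model, ["gene", "cell", "energy"])
print(result)
```

Because routing is hard, an error at the root cannot be corrected lower down: a document sent to the wrong top-level category can only receive a leaf label from that category. The “soft” approach described in the abstract, which shares training data among sibling categories, is not shown here.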
Similar resources
Making Explicit the Hidden Semantics of Hierarchical Classifications
Hierarchical classifications are concept hierarchies used to organize large amounts of documents. File systems, products’ taxonomies for the market place and the directories provided by Web portals are common examples of hierarchical classifications. As semi-structured knowledge sources, hierarchical classifications have peculiar features: they differ both from plain texts since they are based ...
High-Dimensional Unsupervised Active Learning Method
In this work, a hierarchical ensemble of projected clustering algorithm for high-dimensional data is proposed. The basic concept of the algorithm is based on the active learning method (ALM) which is a fuzzy learning scheme, inspired by some behavioral features of human brain functionality. High-dimensional unsupervised active learning method (HUALM) is a clustering algorithm which blurs the da...
Improving the Performance of the HONG Network with Boosting
This paper gives a brief description of a hierarchical architecture (HONG) that has been described elsewhere. The learning algorithm it uses is a mixed unsupervised/supervised method with most of the learning being unsupervised. The architecture generates multiple classifications for every data pattern presented, and combines them to obtain the final classification. The main purpose of this pap...
Applied Multi-Layer Clustering to the Diagnosis of Complex Agro-Systems
In many fields, such as the medical and environmental domains, a lot of data are produced every day. In many cases, the task of machine learning is to analyze these data composed of very heterogeneous types of features. We developed in previous work a classification method based on fuzzy logic, capable of processing three types of features (data): qualitative, quantitative, and more recently intervals. We pro...
Hierarchical overlapped SOM's for pattern classification
We develop multilayer overlapped self-organizing maps (SOM's) with limited structure adaptation capabilities, and an associated learning scheme for labeled pattern classification applications. The learning algorithm consists of the standard unsupervised SOM learning of synaptic weights as well as the supervised learning vector quantization (LVQ) 2 learning. As higher layer SOM's overlap, the fin...
Bridging the semantic gap for software effort estimation by hierarchical feature selection techniques
Software project management is one of the significant activities in the software development process. Software Development Effort Estimation (SDEE) is a challenging task in software project management. SDEE has been an activity in the computer industry since the 1940s and has been reviewed several times. An SDEE model is appropriate if it provides accuracy and confidence simultaneously before softwa...